Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW
Authors
Abstract
Shadowing has become a well-known method for improving learners' overall proficiency. Our previous studies realized automatic scoring of shadowing speech using HMM-based phoneme posteriors, known as GOP (Goodness of Pronunciation), and predicted learners' TOEIC scores adequately. In this study, we extend that work in several ways: 1) a much larger amount of shadowing speech is collected, 2) the utterances are scored manually by two native-speaker teachers, 3) DNN posteriors are used instead of HMM posteriors, and 4) language-independent shadowing assessment based on posterior-based DTW (Dynamic Time Warping) is examined. Experiments suggest that, compared with HMM, DNN improves the teacher-machine correlation substantially, by 0.37, and that DTW based on DNN posteriors yields a correlation as high as 0.74, even when the posteriors are calculated using a language different from the learners' target language.
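The abstract does not specify the frame-level distance or the normalization used in the posterior-based DTW, so the following is only a minimal Python sketch under stated assumptions. Each utterance is represented as a posteriorgram (one phoneme-posterior vector per frame, e.g. from a DNN), frames are compared with the negative log inner product (a common choice for posteriorgram matching, assumed here), and the accumulated cost is divided by the warping-path length. The names `frame_distance`, `dtw_score` and `pearson` are hypothetical, and the teacher-machine correlation is assumed to be Pearson's r. Because only the learner's shadowing and the model speech are compared, and both pass through the same acoustic model, the sketch needs no transcript, which illustrates how such a comparison can work even when the posteriors come from a model of another language.

```python
import math
from typing import List, Sequence

# A posteriorgram: one row per frame, one column per phoneme class.
# Each row is assumed to be a probability vector (non-negative, sums to 1).
Posteriorgram = List[List[float]]


def frame_distance(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Negative log inner product of two posterior vectors (assumed local distance)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))  # eps guards against log(0)


def dtw_score(learner: Posteriorgram, reference: Posteriorgram) -> float:
    """DTW cost between two posteriorgrams, divided by the warping-path length.

    Lower values mean the learner's frames match the reference more closely.
    """
    n, m = len(learner), len(reference)
    inf = float("inf")
    # cost[i][j]: best accumulated cost aligning learner[:i] with reference[:j]
    # steps[i][j]: number of frame pairs on that best path
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(learner[i - 1], reference[j - 1])
            # Cheapest predecessor among diagonal, vertical and horizontal moves;
            # ties in cost are broken in favour of the shorter path.
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + d
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between teacher scores and machine scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Synthetic 3-class posteriorgrams, for illustration only.
    ref = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    close = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
    flat = [[0.34, 0.33, 0.33]] * 4
    # A learner who follows the reference closely gets a lower DTW cost.
    print(dtw_score(close, ref) < dtw_score(flat, ref))  # True
```

Since a lower DTW cost means a closer match, a machine score that should correlate positively with teacher ratings would be the negated cost, e.g. `pearson(teacher_scores, [-s for s in dtw_costs])`, where both lists are hypothetical per-learner values.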
Similar resources
Calibration of Phone Likelihoods in Automatic Speech Recognition
In this paper we study the probabilistic properties of the posteriors in a speech recognition system that uses a deep neural network (DNN) for acoustic modeling. We do this by reducing Kaldi’s DNN shared pdf-id posteriors to phone likelihoods, and using test set forced alignments to evaluate these using a calibration sensitive metric. Individual frame posteriors are in principle well-calibrated...
Content Normalization for Text-independent Speaker Verification
In the past few years, Deep Neural Network (DNN) based i-vector Speaker Verification (SV) systems have been shown to provide state-of-the-art performance. However, error rates increase drastically for short duration recordings. In this paper, we improve the i-vector approach for short utterances, (i) by using smoothed DNN posteriors for i-vector extraction, and (ii) by normalizing the content of the ...
Low-rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance DNN Acoustic Modeling
We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of low-dimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...
Low-rank Representation for Enhanced Deep Neural Network Acoustic Models
Automatic speech recognition (ASR) is a fascinating area of research towards realizing human-machine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Networks (DNNs) to estimate class-conditional posterior probabilities. The posterior probabilities are used for acoustic modeling in hidden Markov models ...
Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling
We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of low-dimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...